UMMU@QALB-2015 Shared Task: Character and Word level SMT pipeline for Automatic Error Correction of Arabic Text

Authors

  • Fethi Bougares
  • Houda Bouamor
Abstract

In this paper we present the LIUM (Laboratoire d’Informatique de l’Université du Maine) and CMU-Q (Carnegie Mellon University in Qatar) joint submission to the Arabic shared task on automatic spelling error correction. Our best system is a sequential combination of two statistical machine translation (SMT) systems trained on top of the MADAMIRA output. The first is character-based and produces a first correction at the character level. The characters are then glued back together to form the input to the second system, which works at the word level. This sequential combination achieves an F1 score of 69.42, which is better than the best F1 score reported on the 2014 test set (67.91). The best UMMU submission to the QALB-15 shared task ranked first out of 10 submissions on the L2 test condition and second out of 12 submissions on the L1 test set.
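The character-to-word hand-off described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the boundary marker `#`, the function names, and the two decoder callables are hypothetical, and the real systems are SMT decoders trained on MADAMIRA-preprocessed text.

```python
from typing import Callable

# Hypothetical word-boundary marker; the abstract does not say which symbol
# (if any) the authors used. It must not occur in the text itself.
BOUNDARY = "#"


def split_to_chars(sentence: str) -> str:
    """Rewrite a sentence as space-separated characters, with BOUNDARY between words."""
    words = sentence.split()
    return f" {BOUNDARY} ".join(" ".join(word) for word in words)


def glue_chars(char_sequence: str) -> str:
    """Glue space-separated characters back into words at each BOUNDARY token."""
    words, current = [], []
    for token in char_sequence.split():
        if token == BOUNDARY:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(token)
    if current:
        words.append("".join(current))
    return " ".join(words)


def correct(
    sentence: str,
    char_system: Callable[[str], str],
    word_system: Callable[[str], str],
) -> str:
    """Run the two systems in sequence: character level first, then word level.

    char_system and word_system are stand-ins for the two trained SMT
    decoders; any callable mapping a string to a string works here.
    """
    char_output = char_system(split_to_chars(sentence))
    return word_system(glue_chars(char_output))


if __name__ == "__main__":
    identity = lambda text: text  # placeholder for a real decoder
    print(split_to_chars("ذهب الى"))
    print(correct("ذهب الى", identity, identity))
```

Because the character-level system sees boundary tokens as ordinary symbols, it could in principle insert or delete them, which is one way such a setup might handle word split and merge errors before the word-level pass.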

Similar resources

A Pipeline Approach to Supervised Error Correction for the QALB-2014 Shared Task

This paper describes our submission to the ANLP-2014 shared task on automatic Arabic error correction. We present a pipeline approach integrating an error detection model, a combination of character- and word-level translation models, a reranking model and a punctuation insertion model. We achieve an F1 score of 62.8% on the development set of the QALB corpus, and 58.6% on the official test set.

QCRI@QALB-2015 Shared Task: Correction of Arabic Text for Native and Non-Native Speakers' Errors

This paper describes the error correction model that we used for the QALB-2015 Automatic Correction of Arabic Text shared task. We employed a case-specific correction approach that handles specific error types such as dialectal word substitution and word splits and merges with the aid of a language model. We also applied corrections that are specific to second language learners that handle erron...

CMUQ@QALB-2014: An SMT-based System for Automatic Arabic Error Correction

In this paper, we describe the CMUQ system we submitted to the ANLP-QALB 2014 Shared Task on Automatic Text Correction for Arabic. Our system combines rule-based linguistic techniques with statistical language modeling techniques and machine translation-based methods. Our system outperforms the baseline and reaches an F-score of 65.42% on the test set of QALB corpus. This ranks us 3rd in the com...

GWU-HASP: Hybrid Arabic Spelling and Punctuation Corrector

In this paper, we describe our Hybrid Arabic Spelling and Punctuation Corrector (HASP). HASP was one of the systems participating in the QALB-2014 Shared Task on Arabic Error Correction. The system uses a CRF (Conditional Random Fields) classifier for correcting punctuation errors, an open-source dictionary (or word list) for detecting errors and generating and filtering candidates, an n-gram l...

The Second QALB Shared Task on Automatic Text Correction for Arabic

We present a summary of QALB-2015, the second shared task on automatic text correction of Arabic texts. The shared task extends QALB-2014, which focused on correcting errors in Arabic texts produced by native speakers of Arabic. The competition this year, in addition to native data, includes texts produced by learners of Arabic as a foreign language. The report includes an overview of the QALB ...

Publication date: 2015